perf: implement batch processing in iterateEvalTree by cheb0 · Pull Request #406 · ozontech/seq-db

cheb0 · 2026-04-22T11:54:11Z

Description

Continuation of #390

iterateEvalTree now works with batches of LIDs and requests MIDs and RIDs in batches
fixes stopwatch measurements for the get_mid step
the array-based hist map is decoupled into its own struct
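
The array-based hist map itself is not shown in this thread. As a rough illustration only, assuming histogram buckets are fixed-width intervals over the requested range, a dense slice indexed by bucket number can replace a map lookup per MID. Every name below (`arrayHist`, `newArrayHist`, `Add`) is hypothetical and not taken from seq-db; the sketch also assumes `interval > 0` and `to >= from`.

```go
package main

import "fmt"

// arrayHist is a hypothetical dense histogram: bucket i counts timestamps in
// [from + i*interval, from + (i+1)*interval).
type arrayHist struct {
	from     uint64
	interval uint64
	buckets  []uint64
}

// newArrayHist sizes the bucket slice so that `to` falls into the last bucket.
func newArrayHist(from, to, interval uint64) *arrayHist {
	n := (to-from)/interval + 1
	return &arrayHist{from: from, interval: interval, buckets: make([]uint64, n)}
}

// Add counts a whole batch of timestamps; values outside the range are skipped.
func (h *arrayHist) Add(mids []uint64) {
	for _, mid := range mids {
		if mid < h.from {
			continue
		}
		idx := (mid - h.from) / h.interval
		if idx >= uint64(len(h.buckets)) {
			continue
		}
		h.buckets[idx]++
	}
}

func main() {
	h := newArrayHist(1000, 1299, 100)
	h.Add([]uint64{1000, 1050, 1100, 1299, 5000})
	fmt.Println(h.buckets) // prints "[2 1 1]"
}
```

Indexing a slice avoids hashing on every update, which may matter for dense queries; whether it accounts for the numbers reported in this PR is not something this sketch can show.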

I measured both patches together (this one combined with #390) against main, using bitpack encoding in both branches. Small, ordinary searches show no benefit; dense analytic queries show a decent improvement.

For our k6 benchmark seq-db-hist.js: 2.3 sec => 650 ms
For seq-db-aggs.js: 6.1 sec => 4.7 sec
Hist over _all_ (warm query) (3 prod fractions): ~37 ms => ~15 ms

Part of #329

I have read and followed all requirements in CONTRIBUTING.md;
I used LLM/AI assistance to make this pull request;

cheb0 · 2026-04-22T11:56:09Z

@seqbenchbot up main search-keyword-exact-match-warm

seqbenchbot · 2026-04-22T11:56:11Z

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - e8eefca9.

Here is a list of helpful links:

Take a look at Grafana dashboard;
Live-tailing logs are also available;

Have a great time!

codecov-commenter · 2026-04-22T11:58:09Z

Codecov Report

❌ Patch coverage is 78.76712% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.58%. Comparing base (da8604a) to head (74748b8).

Files with missing lines	Patch %	Lines
frac/processor/search.go	71.26%	22 Missing and 3 partials ⚠️
frac/sealed/seqids/provider.go	70.00%	2 Missing and 1 partial ⚠️
frac/sealed_index.go	72.72%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@                Coverage Diff                 @@
##           329-batching-1     #406      +/-   ##
==================================================
- Coverage           71.54%   70.58%   -0.97%     
==================================================
  Files                 220      221       +1     
  Lines               16568    20423    +3855     
==================================================
+ Hits                11854    14415    +2561     
- Misses               3840     5128    +1288     
- Partials              874      880       +6

cheb0 · 2026-04-22T12:12:21Z

@seqbenchbot down e8eefca9

seqbenchbot · 2026-04-22T12:12:29Z

Nice, @cheb0 <(-^,^-)=b!

The benchmark with identificator e8eefca9 was finished.
I've prepared a summary for you:

Query	Type	mean base (ms)	mean comp (ms)	mean diff	stddev base (ms)	stddev comp (ms)	stddev diff	p(50) base (ms)	p(50) comp (ms)	p(50) diff	p(95) base (ms)	p(95) comp (ms)	p(95) diff	p(99) base (ms)	p(99) comp (ms)	p(99) diff	iterations base	iterations comp	iterations diff
`bulk`	warm	`65.92`	`67.00`	`+1.65%`	`25.11`	`26.77`	`+6.60%`	`58.00`	`60.00`	`+3.45%`	`118.00`	`124.00`	`+5.08%`	`157.50`	`165.00`	`+4.76%`	`2450.00`	`2450.00`	`0.00%`
`service:payment-backend-eu AND k8s_namespace:prod`	warm	`130.57`	`129.30`	`-0.97%`	`115.82`	`112.04`	`-3.26%`	`114.00`	`115.00`	`+0.88%`	`324.00`	`319.00`	`-1.54%`	`669.50`	`642.50`	`-4.03%`	`8339.00`	`8363.00`	`+0.29%`

Have a great time!

forshev · 2026-05-04T10:09:54Z

+	for _, lid := range lids {
+		rawLid := lid.Unpack()
+		blockIdx := p.table.GetIDBlockIndexByLID(rawLid)
+		if p.midCache.blockIndex != int(blockIdx) {


nit: fillMIDs already performs this check internally. Did you add it here to avoid the function call?
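
To make the question concrete, here is a minimal sketch of a one-block cache where both the caller and the loader compare block indexes. The types and the block layout are assumptions for illustration (`provider`, `blockCap`, and the in-memory `blocks` are hypothetical, not seq-db code); only the shape of the double check mirrors the hunk above.

```go
package main

import "fmt"

const blockCap = 4 // hypothetical number of IDs per block

// provider is a stand-in with a single cached block of MIDs.
type provider struct {
	blocks    [][]uint64 // stand-in for blocks that would be loaded from disk
	cachedIdx int
	cached    []uint64
	loads     int
}

// fillMIDs loads a block unless it is already cached (the inner check).
func (p *provider) fillMIDs(blockIdx int) {
	if p.cachedIdx == blockIdx {
		return
	}
	p.cached = p.blocks[blockIdx]
	p.cachedIdx = blockIdx
	p.loads++
}

// MIDs resolves a batch of LIDs and appends the results to out.
func (p *provider) MIDs(lids []uint32, out []uint64) []uint64 {
	for _, lid := range lids {
		blockIdx := int(lid) / blockCap
		// Outer check: redundant for correctness, it only skips the call.
		if p.cachedIdx != blockIdx {
			p.fillMIDs(blockIdx)
		}
		out = append(out, p.cached[int(lid)%blockCap])
	}
	return out
}

func main() {
	p := &provider{
		blocks:    [][]uint64{{10, 11, 12, 13}, {20, 21, 22, 23}},
		cachedIdx: -1,
	}
	fmt.Println(p.MIDs([]uint32{0, 1, 5, 6}, nil), p.loads) // prints "[10 11 21 22] 2"
}
```

Removing the outer check leaves the result unchanged. Whether it saves anything in practice depends on whether the compiler inlines the loader, which this sketch does not measure.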

forshev · 2026-05-04T12:09:10Z

+		// Get MIDs
+		if needMids > 0 {
+			timerMID.Start()
+			mids = idsIndex.GetMIDs(lidsSlice[0:needMids], mids[:0])


nit: technically we can omit the lower bound if it equals 0

lidsSlice[:needMids]
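
A small sketch of the two idioms in that call: `s[0:n]` and `s[:n]` are the same slice expression, and passing `mids[:0]` lets the callee append into the existing backing array instead of allocating. The function `getMIDs` is a hypothetical stand-in for `idsIndex.GetMIDs`, whose real body is not shown in this thread.

```go
package main

import "fmt"

// getMIDs is a hypothetical stand-in: it appends one value per LID to out
// and returns the extended slice.
func getMIDs(lids []uint32, out []uint64) []uint64 {
	for _, lid := range lids {
		out = append(out, uint64(lid)*10)
	}
	return out
}

func main() {
	lids := []uint32{1, 2, 3, 4}
	mids := make([]uint64, 0, 8)
	need := 3

	a := getMIDs(lids[0:need], mids[:0])
	b := getMIDs(lids[:need], mids[:0]) // same bounds as lids[0:need]

	// Both results have the same length and share the backing array of mids,
	// because its capacity was large enough for append to reuse it.
	fmt.Println(len(a), len(b), &a[0] == &b[0]) // prints "3 3 true"
}
```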

dkharms · 2026-05-12T12:42:59Z

 	return seq.MID(p.midCache.GetValByLID(uint32(lid))), nil
 }

+func (p *Provider) MIDs(lids []node.LID, out []seq.MID) ([]seq.MID, error) {


Why does Provider have a method for retrieving a batch of MIDs but no similar method for RIDs?

dkharms · 2026-05-12T12:55:59Z

+	defer searchBuffersPool.Put(buffers)
+	mids := buffers.mids
+	rids := buffers.rids
+	lidsBuffer := buffers.lids


Shouldn't you reset buffers since slices are reused?
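
One way to address this, sketched under assumptions: truncate every pooled slice to length zero right after `Get`, so stale data from a previous search is never visible while the capacity is kept. The struct below is a simplified, hypothetical mirror of `searchBuffers` (the real field types are not shown in this thread), and `reset` and `collect` are illustrative names. Re-slicing at the call site, as `mids[:0]` does in the hunk reviewed earlier, has the same effect for that one buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// searchBuffers is a hypothetical, simplified mirror of the pooled struct.
type searchBuffers struct {
	mids []uint64
	rids []uint64
}

// reset truncates every slice to length zero while keeping its capacity.
func (b *searchBuffers) reset() {
	b.mids = b.mids[:0]
	b.rids = b.rids[:0]
}

var searchBuffersPool = sync.Pool{
	New: func() any {
		return &searchBuffers{
			mids: make([]uint64, 0, 64),
			rids: make([]uint64, 0, 64),
		}
	},
}

// collect fills the pooled buffer with n values and reports how many it holds.
func collect(n int) int {
	b := searchBuffersPool.Get().(*searchBuffers)
	defer searchBuffersPool.Put(b)
	b.reset() // without this, values left by a previous user could remain

	for i := 0; i < n; i++ {
		b.mids = append(b.mids, uint64(i))
	}
	return len(b.mids)
}

func main() {
	fmt.Println(collect(5), collect(3)) // prints "5 3"
}
```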

dkharms · 2026-05-12T13:00:09Z

+		lidsBuf := lidsBuf{
+			lids: make([]node.LID, 0, consts.LIDBlockCap),
+		}
+		return searchBuffers{


It's better to return a pointer here, otherwise there will be unnecessary allocations since any is returned.
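
The allocation concern can be checked with a small experiment. The sketch below compares a pool that stores a struct by value with one that stores a pointer, using `testing.AllocsPerRun`; the type `buf` and both pools are hypothetical stand-ins for the PR's buffers. Converting a multi-word struct to `any` generally copies it to the heap on each `Put`, while a pointer is stored in the interface directly. Exact counts depend on the Go version and build flags, so no expected output is given.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// buf is a hypothetical pooled buffer.
type buf struct{ lids []uint32 }

var valuePool = sync.Pool{
	New: func() any { return buf{lids: make([]uint32, 0, 64)} },
}

var pointerPool = sync.Pool{
	New: func() any { return &buf{lids: make([]uint32, 0, 64)} },
}

// useValue round-trips a struct value through the pool.
func useValue() {
	b := valuePool.Get().(buf)
	b.lids = append(b.lids[:0], 1)
	valuePool.Put(b) // boxing the struct into `any` needs a heap copy
}

// usePointer round-trips a pointer through the pool.
func usePointer() {
	b := pointerPool.Get().(*buf)
	b.lids = append(b.lids[:0], 1)
	pointerPool.Put(b) // the pointer is stored without copying the struct
}

// measure reports average allocations per call for both variants.
func measure() (valueAllocs, pointerAllocs float64) {
	return testing.AllocsPerRun(1000, useValue), testing.AllocsPerRun(1000, usePointer)
}

func main() {
	v, p := measure()
	fmt.Printf("value: %.0f allocs/op, pointer: %.0f allocs/op\n", v, p)
}
```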

dkharms · 2026-05-12T13:07:47Z

+	filterMIDs := sw.Timer("filter_mids")
+	updateHist := sw.Timer("update_hist")


Suggested change

-	filterMIDs := sw.Timer("filter_mids")
-	updateHist := sw.Timer("update_hist")
+	timerFilterMIDs := sw.Timer("filter_mids")
+	timerUpdateHist := sw.Timer("update_hist")

dkharms · 2026-05-12T13:11:45Z

 }

 type LIDsIter interface {
 	Lids(out []node.LID) []node.LID


Suggested change

LIDs(out []node.LID) []node.LID

dkharms · 2026-05-12T15:33:07Z

I'll leave it here since it is out of scope of this diff.

Take a look at https://github.com/ozontech/seq-db/blob/329-batching-iterate-eval-tree/frac/sealed/lids/iterator_desc.go#L121-L131 -- I guess you've introduced code duplication while performing rebase.

dkharms · 2026-05-12T15:37:49Z

+	return total, ids, hist, aggs, nil
+}
+
+func filterOutOfRangeMIDs(params SearchParams, mids []seq.MID, lidsSlice []node.LID) ([]seq.MID, []node.LID) {


I am not sure what purpose this function serves.

As I understand it, we can never iterate over a seq.LID whose seq.ID lies outside the user-requested range [from; to]. This is guaranteed because we calculate minLID and maxLID in getLIDsBorders and use them as boundaries in all iterators.

Am I missing something?
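
For readers following the question, this is roughly what a filter with that signature could look like: an in-place compaction of two parallel slices that keeps only the entries whose MID is within the range. The body of `filterOutOfRangeMIDs` is not shown in this thread, so the function below (`filterInRange`, with plain integer types and an inclusive range) is purely an assumed illustration. If the iterators already guarantee the range, as the comment above argues, such a filter would never drop anything.

```go
package main

import "fmt"

// filterInRange keeps only the (mid, lid) pairs with from <= mid <= to,
// compacting both slices in place and preserving order.
// It assumes len(lids) >= len(mids).
func filterInRange(from, to uint64, mids []uint64, lids []uint32) ([]uint64, []uint32) {
	n := 0
	for i, mid := range mids {
		if mid < from || mid > to {
			continue
		}
		mids[n], lids[n] = mid, lids[i]
		n++
	}
	return mids[:n], lids[:n]
}

func main() {
	mids := []uint64{5, 10, 15, 20, 25}
	lids := []uint32{1, 2, 3, 4, 5}
	m, l := filterInRange(10, 20, mids, lids)
	fmt.Println(m, l) // prints "[10 15 20] [2 3 4]"
}
```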

dkharms · 2026-05-12T15:42:47Z

+	buffers := searchBuffersPool.Get().(searchBuffers)
+	defer searchBuffersPool.Put(buffers)
+	mids := buffers.mids
+	rids := buffers.rids


Starting a petition to protect Vim users and their descendants — we require spaces. This is how we navigate code. Thank you for your cooperation.

Maybe something like this?

var (
	total  int
	lastID seq.ID
	ids    seq.IDSources
)

buffers := searchBuffersPool.Get().(searchBuffers)
defer searchBuffersPool.Put(buffers)

dkharms · 2026-05-12T15:50:48Z

 		}
 		// limit how much we drain from eval tree for one-by-one flow. ignored for batched flow
-		need = min(need, maxLidsToDrain)
+		needLids = min(needLids, maxLidsToDrain)


Maybe we can move this whole thing with calculating limits/offsets/etc to the batch? I mean something like:

if ok {
	evalTreeIter = func(need int, _ lidsBuf) LIDsIter {
		// batched flow: just get a batch and return
		return batchNode.NextBatch().Trim(need)
		// Or return batchNode.NextBatch(need)
	}
} else {
	...
}

func (b LIDBatch) Trim(k int) LIDBatch {
	b.lids = b.lids[:min(k, len(b.lids))]
	return b
}
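
The `Trim` idea can be tried in isolation. The sketch below assumes a `LIDBatch` that wraps a plain slice (the real type is not shown in this thread) and adds a guard for negative `k`, which is my addition rather than part of the suggestion. Because the receiver is a value, the original batch keeps its length and only the returned copy is shortened.

```go
package main

import "fmt"

// LIDBatch is a hypothetical batch wrapper around a slice of LIDs.
type LIDBatch struct {
	lids []uint32
}

// Trim returns a copy of the batch limited to at most k LIDs.
func (b LIDBatch) Trim(k int) LIDBatch {
	if k < 0 {
		k = 0 // guard added for the sketch; not in the review suggestion
	}
	if k < len(b.lids) {
		b.lids = b.lids[:k]
	}
	return b
}

func main() {
	batch := LIDBatch{lids: []uint32{7, 8, 9, 10}}
	fmt.Println(len(batch.Trim(2).lids), len(batch.Trim(10).lids), len(batch.lids)) // prints "2 4 4"
}
```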

batch processing for iterateEvalTree

74748b8

cheb0 changed the base branch from main to 329-batching-1 April 22, 2026 11:55

cheb0 added 2 commits April 23, 2026 15:44

linter issues

afb477f

sync.pool for all buffers

b62a56b

cheb0 marked this pull request as ready for review April 23, 2026 14:19

eguguchkin requested review from dkharms and forshev April 27, 2026 11:03

forshev approved these changes May 4, 2026

cheb0 added the performance Features or improvements that positively affect seq-db performance label May 12, 2026

dkharms reviewed May 12, 2026

Merge branch '329-batching-1' into 329-batching-iterate-eval-tree

a92bb18
